Simon Fraser University (SFU) - CZSaw
STUDENTS:
Dustin Dunsmuir, Simon Fraser University, dtd@sfu.ca
Saba Alimadadi, Simon Fraser University, salimada@sfu.ca
Victor Chen, Simon Fraser University, yvchen@sfu.ca
Eric Lee, Simon Fraser University, ela10@sfu.ca
Cheryl Qian, Purdue University, cherylq@sfu.ca
FACULTY:
John Dill, Simon Fraser University, dill@sfu.ca
Chris Shaw, Simon Fraser University, shaw@sfu.ca
Robert Woodbury, Simon Fraser University, rw@sfu.ca
CZSaw is
a visual analytics tool for sense-making across text documents with extracted
entities that focuses on the analysis process. It uses a variety of flexible
data visualizations for different perspectives of the networks of people,
places, dates, etc. It records the analysis process and model, and visualizes
them in a history view and dependency graph respectively. The history view
provides quick access to past states in the analysis process. The dependency
graph allows quick rerunning of parts of the analysis process on new data. To
make this possible, all semantically meaningful interactions are
captured in a script language, which an expert user can edit for fine
control of the analysis process.
CZSaw was developed in the School of Interactive
Arts and Technology at Simon Fraser University by Victor Chen, Dustin
Dunsmuir, Eric Lee, Nazanin Kadivar, Cheryl Qian, John Dill, Chris D. Shaw, and
Rob Woodbury. A paper entitled “Capturing and Supporting the Analysis Process”
was presented at VAST 2009, and the presentation materials can be found on the VGTC website. For more
information on CZSaw, see CZSaw's
webpage.
Maxim Roy of the Natural Language Lab at SFU
applied entity extraction to the XML file that was then used by CZSaw. He
used Alias-i’s
LingPipe system for named entity extraction. For numerical
entities he trained a new named-entity model on the MUC-7 news data corpus.
MUC7. 1996. Message Understanding Conference (MUC) 7. LDC Catalog
Id=LDC2001T02.
Video:
ANSWERS:
MC1.1: Summarize the activities that happened in each country with
respect to illegal arms deals based on a synthesis of the information from the
different report types and sources.
State the situation in each country at the end of the period (i.e. the
end of the information you have been given) with respect to illegal arms deals
being pursued. Present a hypothesis
about the next activities you expect to take place, with respect to the people,
groups, and countries.
Detailed Answer:
Because this mini challenge’s components were related, parts of
the data preparation are described in MC1.2. Results described here were
informed by MC1.2’s process. First we transformed the Word files into one XML
file using a custom program. Maxim Roy of SFU’s Natural Language Lab ran
software to extract entities (people, places, etc). The resulting file was then
ready to be loaded into CZSaw, after refining as described in MC1.2.
We used CZSaw’s semantic zoom view (SZV) to examine documents at
several levels of detail: overview, entities in the document, and detailed
text. We used a clustering algorithm that places documents with many entities
in common close to each other, as seen in Figure 1.
Figure 1- SZV with all 103 documents laid out. Colored highlighting is applied based on date ranges in the side bar.
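The idea behind the clustering layout described above can be sketched as a pairwise entity-overlap score. This is only an illustration: the source does not say which similarity measure or layout algorithm CZSaw uses, so the Jaccard measure, the function names, and the toy documents below are all assumptions.

```python
from itertools import combinations


def jaccard(a, b):
    """Share of entities two documents have in common (0.0 to 1.0)."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def pairwise_similarity(doc_entities):
    """Map each unordered pair of document ids to its overlap score."""
    return {
        (d1, d2): jaccard(doc_entities[d1], doc_entities[d2])
        for d1, d2 in combinations(sorted(doc_entities), 2)
    }


# Hypothetical toy input (not the challenge data): ids mapped to entities.
docs = {
    "call-1": {"Celik", "Hakan", "Turkey"},
    "call-2": {"Celik", "Baltasar", "Turkey", "Syria"},
    "email-1": {"Ngoki", "Nigeria"},
}
scores = pairwise_similarity(docs)
```

A layout step could then pull high-scoring pairs together; here the two calls score higher with each other than either does with the email.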
This layout led us to the most
tightly clustered documents: telephone calls between the same few people in
Turkey and Syria. We viewed the documents by semantically zooming into them
(Figure 2), which distorts the overall layout like a fisheye, and then zooming back out,
which restores the original layout.
Figure 2- SZV view with a couple of documents zoomed in part way.
We grouped related documents using the manual group function. We
similarly grouped the other documents clustered around the same people and
countries (Figure 3). Scanning each document’s text and entities, and grouping
the results enabled us to create mutually exclusive groups containing all 103
documents, in approximately two hours. Brushing and searching aided this
categorization. Brushing entities in a document or group caused all documents
containing the same entity to be highlighted (Figure 3). The search feature
enabled searching within the text of all documents.
Figure 3- The SZV with all documents grouped, showing the different tabs of a group and brushing done on the Nahid entity.
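Brushing as described above amounts to looking up every document that contains a given entity. A minimal sketch using an inverted index follows; the function names and the toy documents are hypothetical, and this is not a description of CZSaw's internals.

```python
from collections import defaultdict


def build_entity_index(doc_entities):
    """Invert a document -> entities mapping into entity -> documents."""
    index = defaultdict(set)
    for doc_id, entities in doc_entities.items():
        for entity in entities:
            index[entity].add(doc_id)
    return index


def brush(index, entity):
    """Return the ids of documents to highlight for a brushed entity."""
    return sorted(index.get(entity, set()))


# Hypothetical toy input (not the challenge data).
index = build_entity_index({
    "doc-1": {"Nahid", "Kenya"},
    "doc-2": {"Nahid", "Dubai"},
    "doc-3": {"Dubai"},
})
highlighted = brush(index, "Nahid")
```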
While creating groups from clusters, it became apparent that almost all
documents were related to arms dealing in one of the countries; thus we had to
read them all. Four team members worked in parallel to summarize each group.
SZV groups have tabs representing different views of the contained documents.
Team members used the text tab to read each contained document and the entity
tab to see the combined set of entities and perform brushing.
Customers
Based on a series of emails, we are highly confident George Ngoki of Nigeria
developed a fake government contract to arrange to buy weapons. Ngoki and many
others planned to travel to Dubai as outlined in Figure 4.
A set of phone conversations revealed to us that someone named
Baltasar in Syria is working together with Celik and Hakan in Turkey to
purchase “textbooks” for a school while at the same time Celik is purchasing
some “farming equipment”. Combining these two plans, we can say with moderate
confidence that this group is really purchasing weapons.
Muhammad Kasem, leader of the Martyrs Front of Judea, is in
conflict with Israel in the Gaza/West Bank area. From a variety of sources, we
can say with high confidence that the group purchased weapons from an outside
country.
In Pakistan, the Lashkar-e-Jhangvi terrorist group uses weapons
such as explosives. The leadership of this group almost certainly contains
Azeem Bhutani and Maulana Haq Bukhari. A number of bank transactions from an
account believed to be owned by Bukhari, suspicious packages showing up at
Bukhari’s door, and travel plans made for Dubai make it likely that
Lashkar-e-Jhangvi is buying arms.
In South America, phone conversations and message board posts give
us high confidence that a group of people in Carabobo and Barcelona, Venezuela,
are planning a weapons purchase. They are using Jhon, based out of Medellín,
Colombia to organize this purchase. Bank transactions in Nov 2008 support this
hypothesis.
Arms Dealers: Intermediates
From newspapers
and blogs we have moderate confidence that the Ministry of Police (MP) in Kenya
is involved in the shipment of weapons from their own stores to Sudan forces.
Weapons are transported to the MP from Ukraine, and one such shipment was
hijacked by pirates and held from Oct 2008 to March 2009. Based on their arrest and a
phone call, Thabiti Otieno and his wife Nahid Owiti are likely involved in
organizing the shipments by boat and their onward transport to Sudan. They arranged to be in
Dubai in April 2009, then died in Kenya on May 1st.
It is reported
that Saleh Ahmed is an arms dealer in Yemen and Saudi Arabia. Based on his
phone calls we confirm this with high confidence and hypothesize that he is
obtaining weapons from outside the country. We have high confidence that he
planned to be in Dubai in April 2009 to purchase weapons. On May 3rd
he died in a hospital in Yemen.
Arms Dealers: Source
Based on the many meetings set for Dubai for the week of April 18th, we are
almost certain that the central arms dealers supplying everyone are located in
Russia, Ukraine and Thailand. Based on their phone calls, we have high
confidence that suspected arms dealers Nicolai Kuryakin and Boonmee Khemkhaeng
are selling the weapons. Boonmee is based out of Thailand and almost certainly
acted as a middle man setting up meetings with Nicolai in April 2009 for
weapons purchases. Mikhail Dombrovski is based in Moscow and is also very
likely involved in the selling of weapons to everyone else. Arkadi Borodinski
of Ukraine, an associate of Nicolai, is likely to have attempted transporting
illegal weapons to Iran. He likely arranged for Sattari Khurshid of Iran to meet
Nicolai. Leonid Minsky of Ukraine was involved in illegal arms dealing until
his death in February 2009. Figure 4 shows the meetings planned for Dubai in
April 2009 as recorded in CZNotes, our note-taking facility, which is still under development.
Figure 4- Meetings and events related to Dubai involving arms dealers.
After examining the groups of documents, the existence of several
threads became apparent. We next investigated the social network connecting
these threads (Figure 5).
Figure 5- Social network of main people involved in arms dealing.
MC1.2: Illustrate the
associations among the players in the arms dealing through a social
network. If there are linkages among
countries, please highlight these as well in the social network. Our analysts are interested in seeing
different views of the social network that might help them in
counterintelligence activities (people, places, activities, communication
patterns that are key to the network).
Detailed Answer:
Our analysis process is a sense-making loop in which we extract
and visualize entities, discover linkages, and generate high-level hypotheses.
The dependency graph speeds up this loop by automatically synchronizing data
and views, and propagating changes (e.g. assigning a new value to a variable)
to the whole structure.
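The propagation idea can be illustrated with a small dependency graph in which assigning a new value to a variable recomputes everything downstream. This is a sketch under our own assumptions, not CZSaw's script language or implementation; the `Node` class, its methods, and the example values are hypothetical.

```python
class Node:
    """A value in a dependency graph; derived nodes recompute from inputs."""

    def __init__(self, name, compute=None, inputs=()):
        self.name = name
        self.compute = compute
        self.inputs = list(inputs)
        self.dependents = []
        self.value = None
        for node in self.inputs:
            node.dependents.append(self)

    def assign(self, value):
        """Set a source value and refresh every node downstream of it."""
        self.value = value
        for node in self._downstream():
            args = [i.value for i in node.inputs]
            # Skip nodes whose inputs have not all been set yet.
            if all(a is not None for a in args):
                node.value = node.compute(*args)

    def _downstream(self):
        """Dependent nodes in topological order (reverse DFS post-order)."""
        order, seen = [], set()

        def visit(node):
            for dep in node.dependents:
                if dep not in seen:
                    seen.add(dep)
                    visit(dep)
            order.append(node)

        visit(self)
        return list(reversed(order[:-1]))  # drop self, then reverse


# Hypothetical example: a keyword filter feeding a count.
documents = Node("documents")
keyword = Node("keyword")
matches = Node("matches", lambda docs, kw: [d for d in docs if kw in d],
               [documents, keyword])
count = Node("count", len, [matches])

documents.assign(["call about textbooks", "email about contract"])
keyword.assign("textbooks")
```

Assigning a different keyword afterwards updates `matches` and `count` without any explicit refresh call, which is the behavior the paragraph above describes.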
Entity refinement and data cleaning
Automatic entity extraction is never 100% accurate, and CZSaw’s capabilities to
interactively aid entity extraction during the analysis process were used
extensively (e.g. merging misspellings of names or linking phone numbers to
individuals).
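Entity merging of this kind can be thought of as applying an alias table to the extracted entities. The sketch below assumes a simple dictionary of aliases; the function name, the misspelled variant, and the phone number are hypothetical placeholders rather than values from the data set.

```python
def merge_entities(doc_entities, aliases):
    """Replace each alias with its canonical entity name in every document."""
    return {
        doc_id: {aliases.get(entity, entity) for entity in entities}
        for doc_id, entities in doc_entities.items()
    }


# Hypothetical alias table: a misspelling and a placeholder phone number.
aliases = {
    "Nikolai Kuryakin": "Nicolai Kuryakin",
    "555-0100": "Nicolai Kuryakin",
}
cleaned = merge_entities(
    {
        "doc-1": {"Nikolai Kuryakin", "Dubai"},
        "doc-2": {"555-0100"},
    },
    aliases,
)
```

After merging, both documents refer to the same person entity, so they become linked in any view built on shared entities.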
Analysis Process
We began by clustering and categorizing documents based on content, refining
and cleaning as we went. We say two documents are related if they have common
entities. Based on this, the graph view (Figure 6) clusters groups of
documents.
Figure 6- Document network after initial automatic entity extraction.
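The relatedness rule above (two documents are related if they share an entity) induces groups of connected documents. As a rough sketch, assuming each document is reduced to a set of entity names, the groups can be found with a graph traversal; the function name and toy input are hypothetical, and the graph view itself uses a visual layout rather than this exact procedure.

```python
from itertools import combinations


def related_groups(doc_entities):
    """Group documents that are linked, directly or indirectly, by shared entities."""
    docs = sorted(doc_entities)
    neighbors = {d: set() for d in docs}
    for a, b in combinations(docs, 2):
        if set(doc_entities[a]) & set(doc_entities[b]):
            neighbors[a].add(b)
            neighbors[b].add(a)

    groups, visited = [], set()
    for start in docs:
        if start in visited:
            continue
        group, stack = set(), [start]
        while stack:
            doc = stack.pop()
            if doc in group:
                continue
            group.add(doc)
            stack.extend(neighbors[doc] - group)
        visited |= group
        groups.append(sorted(group))
    return groups


# Hypothetical toy input (not the challenge data).
groups = related_groups({
    "d1": {"Celik", "Turkey"},
    "d2": {"Celik", "Syria"},
    "d3": {"Syria"},
    "d4": {"Ngoki"},
})
```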
The document view enables reading individual documents. During the
reading and other parts of the analysis, we did entity merging and other
“cleaning”. CZSaw’s dependency graph and propagation system automatically
updates visualizations (Figure 7).
Figure 7- Document network after entity refinement. Documents are clustered, leading us to examine each cluster.
A Different View of the Social Network
Co-citation of people in a document shows their connection. People like
Dombrovski and Nicolai (Russia), Ahmed (Yemen), Borodinski (Ukraine), and
Otieno/Owiti (Kenya) play important roles in the network by connecting different
groups of people (Figure 8).
Figure 8- A different view of the social network.
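Co-citation can be sketched as counting, for each pair of people, the documents that mention both. The code below is an illustration only: the function names and the toy documents are assumptions, and the number of distinct neighbors is just one simple proxy for how central a person is.

```python
from collections import Counter
from itertools import combinations


def cocitation_edges(doc_people):
    """Count how many documents mention each pair of people together."""
    edges = Counter()
    for people in doc_people.values():
        for a, b in combinations(sorted(set(people)), 2):
            edges[(a, b)] += 1
    return edges


def neighbor_counts(edges):
    """Number of distinct people each person is co-cited with."""
    counts = Counter()
    for a, b in edges:
        counts[a] += 1
        counts[b] += 1
    return counts


# Hypothetical toy input (not the challenge data).
edges = cocitation_edges({
    "doc-a": ["Ahmed", "Dombrovski"],
    "doc-b": ["Dombrovski", "Nicolai"],
    "doc-c": ["Ahmed", "Dombrovski"],
})
counts = neighbor_counts(edges)
```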
Several
clusters need study, along with possible connections between them. To keep the
graphs readable, we examine connections of only certain entity types at a time.
We start with ‘country’ and ‘organization’
entities (Figure 9). The important countries are: Russia, Ukraine, North Korea,
Nigeria, Kenya, Venezuela, Colombia, Yemen, Saudi Arabia, UAE, Israel, Lebanon,
Pakistan, and Turkey.
Figure 9- Cluster of people connected by countries and organizations.
Our next step is to check whether any arms dealing happens between
clusters. The documents, especially the conversations, use terms
such as textbooks, pliers, etc. We hypothesize that these are
code names for illegal arms, and will refer to them as “equipment”. Figure
10 shows the equipment that is related to at least two people.
Figure 10- People connected with equipment and money entities.
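The filter behind this view (equipment related to at least two people) can be sketched as follows, assuming each document carries a set of people and a set of equipment terms. The function name, parameters, and toy input are hypothetical.

```python
from collections import defaultdict


def shared_equipment(doc_people, doc_equipment, min_people=2):
    """Return equipment terms linked to at least `min_people` distinct people."""
    people_by_item = defaultdict(set)
    for doc_id, items in doc_equipment.items():
        for item in items:
            people_by_item[item] |= set(doc_people.get(doc_id, ()))
    return {
        item: sorted(people)
        for item, people in people_by_item.items()
        if len(people) >= min_people
    }


# Hypothetical toy input (not the challenge data).
shared = shared_equipment(
    doc_people={"c1": {"Celik", "Baltasar"}, "c2": {"Celik"}},
    doc_equipment={"c1": {"textbooks"}, "c2": {"farm"}},
)
```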
Figure 10 shows that some equipment is related to a single group, while other
equipment connects separate clusters. For example, “farm” connects the Turkey/Syria
group to the center farm group, with Celik acting as the connecting node. We can
also see two weapon shipments, the M/V Tanya and the IL76 cargo plane, that
connect a few clusters of people. Further investigation of the Tanya shows that it
carried an illegal arms shipment to Sudan, with Kenya acting as an intermediate country.
Next we show money and account entities in the graph view, and
track the flow of money between different bank accounts. The final destination of
the wire transfers is an account that we hypothesize belongs to Dombrovski, based
on other documents on relations between him and South America.
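Tracking money to its final destination can be sketched as following transfers from a starting account until reaching accounts with no outgoing transfers. In the analysis this was done visually in the graph view; the function, the account labels, and the amounts below are hypothetical placeholders.

```python
def final_destinations(transfers, start):
    """Follow (source, destination, amount) transfers from `start` to end accounts."""
    outgoing = {}
    for src, dst, _amount in transfers:
        outgoing.setdefault(src, set()).add(dst)

    finals, seen, stack = set(), set(), [start]
    while stack:
        account = stack.pop()
        if account in seen:
            continue  # guards against circular transfers
        seen.add(account)
        next_accounts = outgoing.get(account)
        if next_accounts:
            stack.extend(next_accounts)
        else:
            finals.add(account)
    return sorted(finals)


# Hypothetical placeholder accounts and amounts.
transfers = [
    ("acct-A", "acct-B", 100),
    ("acct-B", "acct-C", 100),
    ("acct-A", "acct-C", 50),
]
ends = final_destinations(transfers, "acct-A")
```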
From reading some of the documents, we found that there will
be meetings in Dubai during the week starting on April 15th, 2009.
See Figure 4 in MC1.1 for the dates and the people involved in these
events. Only one of the related reports (USGovIntel-25) mentioned travel to
Dubai on the 18th of April 2008. Given the date on which that document
was written and how closely its contents tie to the rest of this set of documents, we
assume that this is a mistake and that the actual flight happens in April 2009.
An IL-76 cargo plane carrying illegal weapons from North
Korea, arranged by Borodinski (Ukraine), had stopped in the UAE, planning to
go on to Sri Lanka. The plane was scheduled to arrive in Iran on Feb 12th 2008,
but was seized in Thailand on Feb 11th.
There will be a set of meetings in Dubai, from April 15th,
2009 to April 22nd, 2009, between known or suspected arms dealers and
their customers. Some of the meetings will be held at the Burj Al Arab hotel.
The most important meetings and related people and events are summarized in
Figure 4 of MC1.1. Here, we briefly describe some of the important people and
their connections, based on both document text and our inferences. We assume
that there is a group of main illegal arms suppliers, including people from
Russia, Ukraine and Thailand, who have arranged the meetings in this time
period in Dubai.
George Ngoki (Nigeria) is involved in a deal with Mikhail
Dombrovski (Russia) for purchasing arms with a value of $30.6M.
Thabiti Otieno and his wife Nahid Owiti (Kenya) transport firearms
to Sudan through Kenya. These arms include the cargo of the ship Tanya, from Nicolai
Kuryakin, a known arms dealer.
Arms dealers Boonmee Khemkhaeng (Thailand), Nicolai Kuryakin
(Russia) and Arkadi Borodinski (Ukraine) will meet.
Muhammad Kasem (head of MFJ) is buying arms for their “May
operation” from a Russian source that Abdllah Khouri has found.
Baltasar and his friends (Turkey and Syria) want to buy “textbooks
/ farm equipment” from Russia through a Bosnian salesman.
Azeem Bhutani and Maulana Haq Bukhari from Lashkar-e-Jhangvi
(Pakistan) are transferring money to an account in Moscow. They are flying to
Dubai during this time period and the money is probably for the payment of an
illegal arms deal.
Saleh Ahmed (a Yemeni arms dealer who supplies weapons to
neighbouring countries of Saudi Arabia) is going to meet Mikhail Dombrovski and
Nicolai Kuryakin, two major arms dealers from Russia.
Nicolai Kuryakin (Russia), Arkadi Borodinski (Ukraine) and Sattari
Khurshid (Iran) will meet. The first two are known arms dealers and the third
person has a history of working with Borodinski since the plane event.
Vwhombre also wants to buy “car parts” from Joe Tomski (Russia)
through Jhon (Colombia), and there is a money transfer to an account in Moscow.
Dombrovski uses the email address Joetomsk@au.ru. We believe “jtomski” in this
car parts deal is Dombrovski.